The multiple de novo copy number variant (MdnCNV) phenomenon presents with peri-zygotic DNA mutational signatures and multilocus pathogenic variation

Background
The multiple de novo copy number variant (MdnCNV) phenotype is defined by four or more constitutional de novo CNVs (dnCNVs) arising independently throughout the human genome within one generation. It is a rare peri-zygotic mutational event, previously reported in approximately 1 in 12,000 individuals referred for genome-wide chromosomal microarray analysis due to congenital abnormalities. These rare families provide a unique opportunity to understand the genetic factors of peri-zygotic genome instability and the impact of dnCNV on human diseases.

Methods
Chromosomal microarray analysis (CMA), array-based comparative genomic hybridization, and short- and long-read genome sequencing (GS) were performed on the newly identified MdnCNV family to identify de novo mutations including dnCNVs, de novo single-nucleotide variants (dnSNVs), and indels. Short-read GS was performed on four previously published MdnCNV families for dnSNV analysis. Trio-based rare variant analysis was performed on the newly identified individual and four previously published MdnCNV families to identify potential genetic etiologies contributing to the peri-zygotic genomic instability. Lin semantic similarity scores informed quantitative human phenotype ontology analysis on three MdnCNV families to identify gene(s) driving or contributing to the clinical phenotype.

Results
In the newly identified MdnCNV case, we revealed eight de novo tandem duplications, each ~1 Mb, with microhomology at 6/8 breakpoint junctions. Enrichment of de novo single-nucleotide variants (SNV; 6/79) and de novo indels (1/12) was found within 4 Mb of the dnCNV genomic regions. An elevated post-zygotic SNV mutation rate was observed in MdnCNV families. Maternal rare variant analyses identified three genes in distinct families that may contribute to the MdnCNV phenomenon. Phenotype analysis suggests that gene(s) within dnCNV regions contribute to the observed proband phenotype in 3/3 cases. CNVs in two cases, a contiguous gene duplication encompassing PMP22 and RAI1 and another duplication affecting NSD1 and SMARCC2, contribute to the clinically observed phenotypic manifestations.

Conclusions
Characteristic features of dnCNVs reported here are consistent with a microhomology-mediated break-induced replication (MMBIR)-driven mechanism during the peri-zygotic period. Maternal genetic variants in DNA repair genes potentially contribute to peri-zygotic genomic instability. Variable phenotypic features were observed across a cohort of three MdnCNV probands, and computational quantitative phenotyping revealed that two out of three had evidence for the contribution of more than one genetic locus to the proband’s phenotype, supporting the hypothesis of de novo multilocus pathogenic variation (MPV) in those families.

Supplementary Information
The online version contains supplementary material available at 10.1186/s13073-022-01123-w.


Illumina short-read sequencing
For family HOU3579, genome sequencing (GS) was performed on peripheral blood leukocyte-derived DNA from the proband and parents at the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine through the Baylor-Hopkins Center for Mendelian Genomics initiative [1]. Sequencing libraries were prepared with KAPA Hyper reagents and pooled for multiplexed sequencing. The pooled libraries were sequenced on the Illumina HiSeqX platform, which generated 150 bp paired-end reads. After demultiplexing, an average of 127 Gb of sequence data was generated per personal genome library. Post-sequencing data were analyzed using the HGSC HgV pipeline [2,3], which executed base calling, read mapping (BWA-mem), merging of calls, variant calling (xAtlas) [4], post-processing, and collection of quality control (QC) metrics for all sequencing events. Post-sequencing QC was performed with Fluidigm SNPtrace and Error Rate In Sequencing (ERIS) software to ensure sample identity and integrity [5,6].
For the other MdnCNV families (BAB3097, BAB3596, mCNV3/BAB9484, and mCNV7) and the anonymized samples included, GS was performed using a separate protocol. The library was prepared with the KAPA Hyper Prep kit using a PCR-free protocol with a 550-bp insert size. The library is sequenced on the Illumina NovaSeq 6000 platform to generate 150 bp paired-end reads. The following quality control metrics of the sequencing data are generally achieved: average sequenced coverage over the genome >40X and >97.5% of target bases (digital exome) covered at >20X. Data analysis and interpretation are performed by the Baylor Genetics analytics pipeline. The output data from the Illumina NovaSeq are converted from BCL files to FastQ files according to each sample's specific adapter sequence using Illumina's recommended procedure. FastQ data are aligned to the human reference genome build GRCh38
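As a rough illustration of the figures quoted in this section, the sketch below converts sequence yield into mean genome coverage and checks the two stated QC thresholds (>40X mean coverage, >97.5% of target bases at >20X). The genome length of 3.1 Gb, the function names, and the toy depth values are assumptions made for illustration; this is not the HGSC or Baylor Genetics pipeline.

```python
# Illustrative sketch only: names, genome size, and toy depths are assumptions.

GENOME_SIZE_BP = 3.1e9  # assumed approximate length of the human genome


def mean_coverage(yield_bp, genome_size_bp=GENOME_SIZE_BP):
    """Mean sequencing depth implied by the total number of sequenced bases."""
    return yield_bp / genome_size_bp


def fraction_covered(depths, min_depth):
    """Fraction of positions whose depth is strictly greater than min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(d > min_depth for d in depths) / len(depths)


def passes_qc(genome_mean_depth, target_depths,
              min_mean=40.0, min_depth=20, min_fraction=0.975):
    """Apply the two thresholds quoted in the text to one sample."""
    return (genome_mean_depth > min_mean
            and fraction_covered(target_depths, min_depth) > min_fraction)


# 127 Gb per genome corresponds to roughly 41X under the assumed genome size.
hiseq_cov = mean_coverage(127e9)
```

Under the assumed genome length, the 127 Gb average yield reported for the HiSeqX libraries works out to about 41X, in line with the >40X target stated for the NovaSeq protocol.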

Clinical description of BAB9637, mCNV3/BAB9484 and BAB3097
BAB9637, a 14-year-old male at the last clinical evaluation, was born at 38 weeks gestational age via Cesarean section to a healthy 37-year-old mother (BAB9638) and 36-year-old father (BAB9639). The proband's birth weight was 2.72 kg (Z = -1.36). Other neonatal anthropometric measurements were not reported. The neonatal period was complicated only by an inguinal hernia that was surgically repaired at 21 days of life. At 10 years of age, the proband was evaluated in the genetics clinic for a history of developmental delay (DD); the DD was a sporadic trait in the family. Notably, the parents reported that the child rolled over at 4 months of age, sat upright at 6.5 months, walked at 14 months, and spoke his first words as a 1-year-old. However, he stopped talking at 3 years of age and regained only some speech thereafter. Other specific developmental milestones were not reported. He failed to advance to grade 1 from kindergarten and required special education. By fifth grade, he was able to read, write, add, and multiply. Upon examination at 10 years old, he was found to have a height of 123.5 cm (Z = -3.01), a weight of 23.2 kg (Z = -2.73), and a head circumference (occipitofrontal circumference, OFC) of 52.4 cm (Z = -0.62). He was noted to have dysmorphic craniofacial features, including trigonocephaly, broad forehead, small widow's peak, hypertelorism with bushy eyebrows, protruding ears, and thin lips. Skin examination revealed large hyperpigmented macules. He was also noted to have proximally placed thumbs with small hands and feet, thenar and hypothenar atrophy, winged scapulae, genital hypoplasia, and thoracolumbar scoliosis. Throughout the examination, infrequent eye contact was noted, and his parents reported a prior clinical diagnosis of borderline autism.
Family history was notable for three siblings who were reported to have neurobehavioral differences: a 16-year-old sister, a 9-year-old brother, and a 6-year-old sister. The youngest sister was also reported to have autism and intellectual disability, as well as relative macrocephaly. The clinical CMA for the same sister was non-diagnostic. A high-resolution research CMA did not detect the dnCNV in any family members (Figure 1c). The mother additionally reported four spontaneous abortions.
BAB3097 was 5 years old at the last evaluation and exhibited global developmental delay. She was able to walk with aid starting at 4.5 years of age and had a vocabulary of 10 words. She was able to understand one- or two-part commands. She had chronic kidney failure (stage 2/3 with dysplastic kidneys) treated with calcitriol. In addition, her phenotype included strabismus, a ventricular septal defect that closed spontaneously, mild hearing loss (30 dB), constipation, and allergies presenting as atopic dermatitis or systemic allergic reaction. Her behavior was sociable, and she enjoyed spending time with other children; however, she exhibited neophobia and needed time to acclimate to novel situations.
Table abbreviations and footnotes:
Dup, duplication; Pat, paternal; Mat, maternal.
* Not reported in the clinical report due to lack of genes in the region.
a Rank-ordered Lin similarity score.
b Gene combinations with a more than 5% increase in similarity score were listed.
Table column headers: Whole gene duplication; OMIM/ORPHA description.
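The Lin similarity score named in the table footnotes can be sketched on a toy ontology as follows. Lin similarity between two terms is twice the information content (IC) of their most informative common ancestor divided by the sum of the two terms' IC values, with IC = -log(p) and p the annotation frequency of a term. The term names, ontology structure, and frequencies below are invented for illustration and are not HPO data; how term-level scores were aggregated into proband-to-gene scores in the analysis is not shown here.

```python
import math

# Toy ontology (child -> parents); term names and structure are invented.
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "growth": ["root"],
    "dev_delay": ["neuro"],
    "autism": ["neuro"],
    "short_stature": ["growth"],
}

# Assumed fraction of annotated entities carrying each term or a descendant.
FREQ = {"root": 1.0, "neuro": 0.4, "growth": 0.3,
        "dev_delay": 0.2, "autism": 0.1, "short_stature": 0.15}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: rarer terms are more informative."""
    return -math.log(FREQ[term])


def lin(a, b):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    common = ancestors(a) & ancestors(b)
    mica_ic = max(ic(t) for t in common)
    denom = ic(a) + ic(b)
    # Undefined when both terms are the root (IC = 0); returned as 0.0 here.
    return 2 * mica_ic / denom if denom > 0 else 0.0
```

In this toy setup, two terms that share only the root score 0, identical terms score 1, and sibling terms under a shared, moderately specific parent fall in between.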